In an experimental study, we explored the role of auditory perception bias in vocal
pitch imitation. Psychoacoustic tasks involving a missing fundamental indicate that some
listeners are attuned to the relationship between all the higher harmonics present in
the signal, which supports their perception of the fundamental frequency (the primary
acoustic correlate of pitch). Other listeners focus on the lowest harmonic constituents of
the complex sound signal, which may hamper the perception of the fundamental. These
two listener types are referred to as fundamental and spectral listeners, respectively. We
hypothesized that the individual differences in speakers' capacity to imitate F0 found in
earlier studies may at least partly be due to the capacity to extract information about
F0 from the speech signal. Participants' auditory perception bias was determined with a
standard missing fundamental perceptual test. Subsequently, speech data were collected
in a shadowing task with two conditions, one with a full speech signal and one with
high-pass filtered speech above 300 Hz. The results showed that perception bias toward
fundamental frequency was related to the degree of F0 imitation. The effect was stronger
in the condition with high-pass filtered speech. The experimental outcomes suggest
advantages for fundamental listeners in communicative situations where F0 imitation is
used as a behavioral cue. Future research needs to determine to what extent auditory
perception bias may be related to other individual properties known to improve imitation,
such as phonetic talent.
INTRODUCTION

Due to a plethora of linguistic and social functions, vocal pitch 
imitation plays a central role in human interaction. In language 
use, pitch, the perceptual correlate of fundamental frequency (F0),
typically located between 50 and 500 Hz in the human speech signal,
encodes linguistic information regarding speech act and sentence 
types (Nilsenova, 2007), information structure, and, in many lan- 
guages, lexical meanings (Ladd, 1996). Pitch imitation arguably 
accelerates acquisition of these linguistic functions because it 
is faster than individual, i.e., trial-and-error based, discovery
(Meltzoff et al., 2009). Imitation of phonetic features has also 
been found to improve speech comprehension (Adank et al., 
2010). Listeners who mimicked a novel pronunciation of a sen- 
tence improved their subsequent speech reception thresholds for 
the sentence in a condition with background noise. Next to its 
linguistic functions, pitch is also the most important vocal source 
of information regarding emotions, stances and attitudes of the
speaker (Juslin and Laukka, 2003; Ververidis and Kotropoulos, 
2006). The F0 region provides acoustic information for imitation
exploited in promoting social convergence and status accommo- 
dation (Gregory and Hoyt, 1982; Gregory, 1983; Gregory et al., 
1993; Gregory and Webster, 1996; Gregory et al., 1997; Haas and
Gregory, 2005; Pardo, 2006) and expressing ingroup-outgroup 
bias (Babel, 2009; Pardo et al., 2012). Speakers who are perceived 
as attractive, likable and/or dominant influence listeners' pitch 
output, and pitch convergence can be seen as an indicator of 
cooperative behavior in communication dyads (Nilsenova and
Swerts, 2012; Okada et al., 2012). Pitch divergence, on the other
hand, suggests that speakers may wish to be viewed as dissimilar 
and increase social distance between themselves (Giles, 1973). The 
capacity to perceive the fundamental frequency in the speech sig- 
nal correctly and to adapt one's own pitch production according 
to one's linguistic and social goals is thus a core communicative 
skill (Giles and Coupland, 1991). 

The results of a range of experimental studies suggest that 
speakers effortlessly imitate and converge to the phonetic properties
of recently heard speech (Natale, 1975; Shockley et al., 2004;
Pardo, 2006; Delvaux and Soquet, 2007; Gentilucci and Bernardis,
2007; Nielsen, 2011), including pitch (Goldinger, 1998; Babel and
Bulatov, 2012; Gorisch et al., 2012). However, as noted by Babel 
and Bulatov (2012), in the context of the standard shadowing 
paradigm, large individual differences can be found in the degree 
of pitch imitation, with only some participants actually converging
to the F0 of the model talker (Babel and Bulatov, 2012, p. 240).
The proposal of our study is that individual variation in the imi- 
tation of pitch is, at least partly, due to basic acoustic perceptual 
mechanisms that also influence pitch production. 

Most speech imitation studies assume that there exist few indi- 
vidual differences among healthy hearing subjects with respect to 
the low-level processing of the speech signal. However, past psychoacoustic
research involving stimuli with a missing fundamental
indicated that there is a difference between two auditory per- 
ceptual extremes, sometimes referred to as analytic and holis- 
tic/synthetic listeners (von Helmholtz, 1885; Smoorenburg, 1970; 
Houtsma, 1979), henceforth referred to as spectral and fundamental
listeners, respectively. Spectral listeners primarily focus
on the individual harmonic constituents; they "decompose the
sound" (Schneider and Wengenroth, 2009, p. 316), while fun- 
damental listeners are attuned to the relationship between all 
the higher harmonics present in the signal, which supports their 
perception of the fundamental frequency (Rousseau et al., 1996; 
Laguitton et al., 1998; Seither-Preisler et al., 2007). According to 
von Helmholtz (1885), for fundamental listeners, it is as if the har- 
monics "fuse into the whole mass of musical sound" (Schneider 
and Wengenroth, 2009), hence his choice of the term "holistic" 
or "synthetic" to refer to this type of listening mode. While in 
practice, few listeners perform uniquely at the absolutes of one or 
the other type (Ladd et al., 2013), the perceptual bias may lead
to different interpretations of perceived pitch values in particu- 
lar contexts. On the one hand, the perception of the fundamental 
frequency is supported by so-called combination tones generated 
in the cochlea (Plomp, 1976). These tones differ across individ- 
ual listeners (Probst et al., 1986). On the other hand, results of 
structural MRI studies suggest that the bias is, at least partly, 
due to a right-/leftward asymmetry of gray matter volume in 
the lateral Heschl's gyrus (Schneider et al., 2005a,b; Wong et al.,
2008), the so-called "pitch processing center" (Griffiths, 2003).
In particular, larger volumes of right Heschl's gyrus seem to be
associated with spectral perceptual bias, while the left Heschl's
gyrus has been linked to changes in F0 modulation and temporal
information (Schneider et al., 2005a,b; Warrier et al., 2009).
Until fairly recently, the perceptual bias has mainly been exam- 
ined in the context of musical psychoacoustics. The research 
outcomes of Wong et al. (2008), however, may be interpreted as 
support for the claim that it may also affect linguistic perfor- 
mance. In their study of lexical tone perception, listeners who 
performed worse in a word identification task involving vowels 
with superimposed tones showed a smaller Heschl's gyrus volume 
on the left than listeners who performed better. Given the tight 
link between perception and production, recently implemented 
in the "forward-model" of Pickering and Garrod (2013) where 
internal simulation of input utterances facilitates comprehension 
and shapes phonetic output, we assume that advantages in the 
perception of F0 might improve its imitation. In other words, fundamental
listeners may have a better capacity to adapt their pitch
to their communication partners than spectral listeners. 

In what follows, we present the results of a production study 
conducted to determine the effect of auditory perception bias on 
automatic pitch imitation in a classical shadowing task. Listeners' 
perception bias was determined with the help of missing funda- 
mental stimuli, an idea that originated with Smoorenburg (1970) 
who introduced a forced-choice task involving sequences of two 
complex tones. In the task, participants are presented with a 
sequence and asked to indicate if the perceived pitch is rising or 
falling. The crux of the task is that the tone sequence is designed 
to have an ambiguous pitch change. Each complex tone is created
from $m$ partials $F_n, F_{n+1}, \ldots, F_{n+m-1}$ ($n$ is an integer,
$n > 0$), without the fundamental $F_0$. The ambiguity arises from
the opposite changes of the (missing) fundamentals ($F_0$) and the
(physically present) lowest partials ($F_{lp}$). When the subsequent
fundamentals $F_0$ are rising, the lowest partials $F_{lp}$ are falling,
and vice versa. Representing the partials of the first and second
tones by $F^1$ and $F^2$, respectively, fundamental listeners will perceive
the change in pitch $\Delta P_f$ by computing
$\Delta P_f = (F^2_{k+1} - F^2_k) - (F^1_{k+1} - F^1_k)$
($k \in \{n, n+1, \ldots, n+m-2\}$) in order to
estimate $F^2_0 - F^1_0$. Spectral listeners will rely on
$\Delta P_{sp} = F^2_{lp} - F^1_{lp}$
to determine if the pitch is rising or falling. Figure 1 illustrates an
ambiguous tone sequence. The sequence depicted has a falling $F_0$
($\Delta P_f < 0$) and a rising $F_{lp}$ ($\Delta P_{sp} > 0$).
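
As a concrete illustration of such an ambiguous pair, the following sketch computes both pitch-change estimates for two hypothetical tones. The fundamentals (200 Hz and 150 Hz), the harmonic numbers, and the function names are assumptions made for this example and are not the stimuli of the experiment. The change in harmonic spacing stands in for the fundamental listener's estimate, and the change in the lowest partial for the spectral listener's estimate.

```python
def partials(f0, harmonic_numbers):
    """Frequencies (Hz) of the physically present partials of a complex tone."""
    return [f0 * n for n in harmonic_numbers]


def delta_p_fundamental(tone1, tone2):
    """Change in harmonic spacing, which equals the change in the missing F0."""
    spacing1 = tone1[1] - tone1[0]
    spacing2 = tone2[1] - tone2[0]
    return spacing2 - spacing1


def delta_p_spectral(tone1, tone2):
    """Change in the lowest physically present partial."""
    return min(tone2) - min(tone1)


# Hypothetical stimulus pair (values chosen for illustration only):
# both tones share the highest partial (1200 Hz) but differ in F0.
tone1 = partials(200.0, [4, 5, 6])   # 800, 1000, 1200 Hz
tone2 = partials(150.0, [6, 7, 8])   # 900, 1050, 1200 Hz

dp_f = delta_p_fundamental(tone1, tone2)   # negative: missing F0 falls
dp_sp = delta_p_spectral(tone1, tone2)     # positive: lowest partial rises
print(dp_f, dp_sp)
```

In this example the two estimates have opposite signs, so a listener answering "falling" would be classified as following the fundamental and a listener answering "rising" as following the spectrum.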

Since the early work of Smoorenburg (1970), the listener type 
task has been frequently employed to study how acoustic variables 
(e.g., F0 value, ΔF value, number of partials) affect the perception
of pitch (Plomp, 1965; Ladd et al., 2013). Schneider et al.
(2005b) and Seither-Preisler et al. (2007) used the task to explore 
the distribution of listener types in relation to musical training. 
In both studies, participants were presented with a number of 
ambiguous-sequence stimuli. The proportion of stimuli to which 
a fundamental or spectral pitch change was perceived by the participants
defined the so-called "Coefficient of Sound Perception
Preference" (δp), a value ranging from −1 (all stimuli perceived as
spectral) to +1 (all stimuli perceived as fundamental). To prevent
the emergence of combination tones that arise at the level of the 
cochlea (Terhardt, 1974), Schneider et al. (2005b) presented tones
at low intensity and Seither-Preisler et al. (2007) added mask- 
ing noise to their stimulus sequences. Given that the perception 
thresholds of combination tones vary with the individual (Plomp, 
1976), following Ladd et al. (2013), we made use of stimuli with- 
out masking in an attempt to include possible effects of cochlear 
mechanisms on the perception and production of pitch in speech. 

MATERIALS AND METHODS 
PARTICIPANTS 

Eighty-eight Dutch native speakers (67 females) between the ages
of 17 and 25 years (M = 20.48, SD = 2.12) participated in the experiment
for course credit. None of them reported any hearing
difficulties. Fourteen of the participants were left-handed; about
one half of the experimental group described their musical proficiency
as low to average, the other half assessed their proficiency
as high to professional. Male and female participants were divided equally
between the two experimental conditions. Prior to the experiment,
which had received approval from the ethical committee,
participants provided their written informed consent.

FIGURE 1 | Illustration of an ambiguous two-tone sequence to
determine auditory perception bias. The sequence has a falling (missing)
fundamental F0 and a rising lowest partial Flp. The horizontal lines
represent the partials of the tone. The higher partials are physically
present (solid lines) and the lower partials are missing (dashed lines).

MEASURING AUDITORY PERCEPTION BIAS 

Participants' auditory perception bias was determined with a 
variation of the psychoacoustic perceptual test described in 
Smoorenburg (1970), Laguitton et al. (1998), Schneider et al. 
(2005b), and Seither-Preisler et al. (2007). For the perceptual 
test, we constructed 36 pairs of complex harmonic tones, all 
160 ms long, that consisted of 2-4 harmonics, with the same 
harmonic composition as employed by Laguitton et al. (1998). 
Participants were asked to categorize 18 perceptually ambiguous 
stimuli sequences consisting of two complex tones, tone 1 and 
tone 2 as illustrated in Figure 1. All tones were composed of a 
number of upper harmonic tones with the same highest harmonic 
but different levels of virtual fundamental pitch (derived from the 
harmonics as the best fit) and spectral pitch (based on the low- 
est harmonic). The other 18 stimuli served as control trials in 
that their interpretation was unambiguous but helped to deter- 
mine a participant's level of attention to the task. Listeners were 
instructed to categorize each experimental stimulus (tone pair) as 
either "rising" or "falling," depending on their perception of the 
sequence. Based on their answers, we calculated their individual
"Coefficient of Sound Perception Preference" (δp) using the equation
δp = (F − Sp)/(F + Sp), where F is the number of virtual
fundamental classifications and Sp the number of spectral classifications.
We calculated the "Listener Attention Coefficient" (δA)
as the proportion of correctly categorized unambiguous stimuli.
In order to test the validity of the perceptual test, we repeated
the measurement approximately 1 month later under the same
conditions with a subset of the participant set (N = 64). In the
analyses presented below, we report the overall results for all
experimental stimuli (δp), as well as the results for stimuli where
the lowest present component frequency Fn > 1000 Hz (δp1000).
The 1000 Hz value is arguably the highest frequency at which F0
could be produced by a human voice and also the approximate
maximal value at which the missing fundamental phenomenon
occurs (Fletcher, 1924). Stimuli with Fn > 1000 Hz thus arguably
support the perception of the missing fundamental.
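
The two coefficients can be sketched in a few lines. The response labels, the example listener, and the function names below are hypothetical and serve only to show the arithmetic of the preference and attention coefficients.

```python
def sound_perception_preference(responses):
    """delta_p = (F - Sp) / (F + Sp) over the ambiguous trials.

    `responses` holds one label per ambiguous trial: "F" if the answer
    followed the missing fundamental, "Sp" if it followed the lowest partial.
    """
    f_count = responses.count("F")
    sp_count = responses.count("Sp")
    return (f_count - sp_count) / (f_count + sp_count)


def listener_attention(answers, correct_answers):
    """delta_A: proportion of unambiguous control trials answered correctly."""
    hits = sum(a == c for a, c in zip(answers, correct_answers))
    return hits / len(correct_answers)


# Hypothetical listener: 13 fundamental and 5 spectral classifications
# on the 18 ambiguous trials, and 17 of 18 control trials correct.
ambiguous = ["F"] * 13 + ["Sp"] * 5
control_truth = ["rising", "falling"] * 9
control_given = control_truth[:17] + ["rising"]  # last trial answered wrongly

delta_p = sound_perception_preference(ambiguous)
delta_a = listener_attention(control_given, control_truth)
print(round(delta_p, 3), round(delta_a, 3))
```

A listener who always follows the fundamental obtains +1 and one who always follows the lowest partial obtains −1, matching the range of the coefficient described above.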

SPEECH IMITATION TASK 

The shadowing task took place immediately after the psychoacoustic
task. It consisted of eight declarative and eight interrogative
sentences uttered by four different model talkers (two
male, two female) in a between-subject design in order to max- 
imize exposure to the model speaker's voice and thus increase 
chances of possible imitation. The 16 sentences were recorded 
four times: in the first and fourth block, the participants read the 
sentences in a randomized order (same for all participants) from 
a PowerPoint slide; the declarative and interrogative sentences 
were presented in a mixed design. In the second and third block, 
they were asked to repeat the sentences as they were presented
to them (in the auditory modality only) through high-quality headphones
(Sennheiser HMD 26-600-7). The participants were not
explicitly instructed to imitate the speakers' pronunciation but 
simply to repeat the utterances. They were randomly assigned to 
one of two between-subject conditions (filtered vs. unfiltered). In 
the filtered condition, participants heard recordings that were filtered
with a ninth-order high-pass Butterworth filter with a cutoff
frequency of 300 Hz, using Matlab's Signal Processing Toolbox.
The participants in the unfiltered condition heard full speech 
recordings. 
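
To give a sense of how strongly such a filter removes the F0 region, the sketch below evaluates the textbook magnitude response of an ideal (analog) ninth-order Butterworth high-pass filter with a 300 Hz cutoff. This is an illustration under that idealized assumption, not a reproduction of the Matlab processing used in the experiment; the function name and the probe frequencies are chosen for the example.

```python
import math


def butterworth_highpass_gain_db(freq_hz, cutoff_hz=300.0, order=9):
    """Gain (dB) of an ideal analog Butterworth high-pass filter.

    Uses |H(f)|^2 = 1 / (1 + (fc / f)^(2 * order)).
    """
    power_gain = 1.0 / (1.0 + (cutoff_hz / freq_hz) ** (2 * order))
    return 10.0 * math.log10(power_gain)


# Probe frequencies: typical F0 values (100-250 Hz), the cutoff itself,
# and a frequency in the region of the higher harmonics (600 Hz).
for f in (100.0, 150.0, 250.0, 300.0, 600.0):
    print(f"{f:6.0f} Hz: {butterworth_highpass_gain_db(f):8.2f} dB")
```

Under this idealization, a component at 150 Hz is attenuated by roughly 54 dB, whereas components an octave above the cutoff pass almost unchanged, so listeners in the filtered condition would have to recover F0 largely from the higher harmonics.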

AUTOMATIC PITCH ESTIMATION 

An initial set of analyses was performed on the whole corpus,
with a subsequent more detailed analysis of a shorter speech
segment. The recordings were segmented per utterance and analyzed
using the autocorrelation method (see, e.g., Rabiner et al.,
1976), implemented in Matlab using a frame length of 10 ms
with 5 ms overlap and a frequency range of 50-500 Hz. For the
whole corpus, we computed five statistical descriptors of F0: the
mean value, the maximum, the minimum, the range (max − min),
and the standard deviation. The degree of F0 imitation was determined
by assessing the z-score of the absolute difference between
the model speaker's F0 descriptor and the participant's F0 descriptor
in the first block (D1, baseline) and the second and third
block (first and second shadowing, D2 and D3, respectively). We
defined two measures of imitation, F0 Imitation1 = D1 − D2 and
F0 Imitation2 = D1 − D3. The statistical analyses were conducted
with the IBM SPSS Statistics software v.2.0.
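
The two imitation measures can be sketched as follows. For simplicity, the example works directly on raw descriptor values in Hz and omits the z-scoring step mentioned above; the numbers and function names are hypothetical.

```python
def distance(model_value, participant_value):
    """Absolute difference between a model and a participant F0 descriptor."""
    return abs(model_value - participant_value)


def imitation_scores(model, baseline, shadow1, shadow2):
    """Return (F0 Imitation1, F0 Imitation2) = (D1 - D2, D1 - D3).

    Positive values mean the participant moved toward the model talker
    relative to baseline; negative values mean divergence.
    """
    d1 = distance(model, baseline)
    d2 = distance(model, shadow1)
    d3 = distance(model, shadow2)
    return d1 - d2, d1 - d3


# Hypothetical mean-F0 values in Hz (not taken from the experiment).
model_mean_f0 = 210.0
imitation1, imitation2 = imitation_scores(
    model_mean_f0, baseline=180.0, shadow1=195.0, shadow2=175.0
)
print(imitation1, imitation2)
```

In this made-up case the participant converges toward the model in the first shadowing block (positive score) and drifts away in the second (negative score).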

RESULTS 

In this section we present the results of our experiment in four 
parts. First, we present the descriptive values of the "Coefficient of 
Sound Perception Preference" δp in the first and second measurement.
Second, all results obtained in the first measurement are
compared to global (sentence-level) imitative behavior. Third,
using smaller speech segments, a correlation analysis is performed 
on the psychoacoustic and socio-demographic variables to deter- 
mine the inclusion of variables in a regression analysis. Finally, the 
results of a hierarchical multiple regression analysis are presented 
that relate auditory perception bias to F0 imitation.

COEFFICIENT OF SOUND PERCEPTION PREFERENCE 

The Shapiro-Wilk test of normality revealed that the coefficient
δp was not normally distributed: the majority of the participants
performed as fundamental listeners (Mean δp = 0.397, SD =
0.406). For a distribution of δp, see Figure 2. A comparison
of the first and the second measurement showed that repeated
exposure to the ambiguous stimuli resulted in a shift toward
the fundamental bias, with a significant correlation between
the two measurements (Spearman's ρ = 0.69, p < 0.001). The
test-retest correlation was comparable to that provided by Ladd
et al. (2013). The difference between the two measurements was
marginally significant with the Wilcoxon Signed Ranks test, Z =
−1.87, p = 0.06. In order to explore the possibility that the
difference between the first and the second measurement of participants'
perception was due to the level of attention devoted to
the task, we compared the absolute difference between the first
and the second δp (δp1 and δp2) to the attention coefficient δA.
The correlation between the attention coefficient δA in the first
measurement and |δp1 − δp2| was significant (Spearman's
ρ = −0.35, p < 0.01), indicating that poor attention to the task
during the first measurement may have been the reason for the
observed shift in δp (given that the shift was in the direction from
"undecided" to a more "pure" type of perception, see Figure 2).
As pointed out by Seither-Preisler et al. (2007), however, who
reported a similar result attributed to repeated exposure, an effect
due to learning cannot be excluded (no measures of δA were provided
in their study). In the subsequent analysis relating speakers'
perceptual bias to their capacity to imitate F0, we used the value of
δp collected during the first measurement, i.e., in the same session
as the shadowing task.

SENTENCE-LEVEL IMITATION 

In the initial global analysis of the whole corpus, for each
pitch value, we conducted two statistical tests, one with the full
participant sample and one where we only included those participants
who had more than 90% correct (fewer than 2 mistakes
out of the total of 18 trials) in the categorization of the unambiguous
stimuli in the psychoacoustic task (N = 41 in total, with
N(full signal) = 22 and N(filtered) = 19) and were thus assumed to be
reliable as listeners. Tables 1 and 2 give an overview of the correlations
between F0 imitation (rows expressed in Hz) for the five
descriptors (columns) and the Coefficient of Sound Perception
Preference, δp, split by condition (with full signal, viz. Table 1,
and with the signal under 300 Hz filtered out, viz. Table 2).

The results of the global analyses suggested that, overwhelmingly,
participants who performed reliably on the non-ambiguous
task and scored higher in the direction of fundamental listeners
imitated the model speakers' pitch to a higher degree, especially
in the condition with filtered speech signal. Given that
the analyses were performed on full utterances, however, they
might have been less likely to capture F0 imitation that typically
occurs on individual segments (especially vowels) and less
reliable given that local minima and maxima (that, in turn,
affect the range and SD) may be outliers in the signal without
communicative significance. Therefore, we proceeded with
a more fine-grained analysis of a subset of the corpus, in which
we also included socio-demographic variables collected in the
experiment.

FIGURE 2 | Distribution of δp during the first session and second session of the psychoacoustic task
(−1 = spectral, +1 = fundamental). Session 1: Mean = 0.40, SD = 0.406, N = 88. Session 2: Mean = 0.46,
SD = 0.369, N = 64.

Table 1 | Pearson product-moment correlations between δp and F0 imitation in the full signal condition (first value for all participants, second
value for participants with PC(δA) > 90).

Variable        Mean F0       F0 Max        F0 Min         F0 Range      F0 SD
F0 Imitation1   -0.06/-0.17   -0.08/0.043   -0.23*/-0.15   0.30*/0.37*   0.20*/0.4[?]*
F0 Imitation2   -0.01/-0.16   0.11/0.16     -0.18/-0.07    0.23*/0.33    0.14/0.39*

PC = "% correct."
*p < 0.10, *p < 0.05.

Table 2 | Pearson product-moment correlations between δp and F0 imitation in the filtered condition (first value for all participants, second
value for participants with PC(δA) > 90).

Variable        Mean F0       F0 Max         F0 Min         F0 Range      F0 SD
F0 Imitation1   0.03/0.37     -0.07/0.44*    -0.03/0.06     0.17/0.33*    0.22*/0.33
F0 Imitation2   -0.02/0.43*   -0.23*/0.42*   -0.27*/-0.07   -0.09/0.40*   -0.04/0.34

PC = "% correct."
*p < 0.10, *p < 0.05.

Frontiers in Psychology | Cognitive Science   November 2013 | Volume 4 | Article 826 | 4
Postma-Nilsenova and Postma   Auditory perception bias

VOWEL SEGMENT IMITATION 

In order to limit the size of the corpus collected in the shadow- 
ing task, we randomly selected one of the interrogative sentences 
for the subsequent analyses, focusing on its initial voiced segment. 
The choice of an interrogative sentence was driven by the assumption
that (1) imitation is likely to occur at sentence-initial boundaries
immediately following the model talker's output (Nilsenova
and Nolting, 2010), and (2) polar (yes/no-) interrogatives that are 
context-free (no particular word in the interrogative is in focus) 
are intonationally marked by a pitch excursion (van Heuven and 
Haan, 2002), i.e., in this case, on the finite verb that is sentence- 
initial due to subject-verb inversion. An automatic analysis of 
pitch was performed on the initial occurrence of the vowel /a/ 
in the sentence. The segment fundamental frequency was determined
by averaging over the F0 values of approximately the first
half of the initial vowel in order to avoid right vowel boundary 
detection errors. 
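
A minimal sketch of this kind of segment-level estimate is given below, assuming a plain autocorrelation peak search within the 50-500 Hz range followed by averaging over the first half of the frame-wise F0 values. The frame length (40 ms), hop size (10 ms), sampling rate, and the synthetic test vowel are assumptions made for this illustration and do not reproduce the Matlab implementation used in the study.

```python
import math


def autocorr_f0(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate F0 (Hz) of one frame from the highest autocorrelation peak."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


def first_half_mean_f0(signal, sample_rate, frame_len, hop):
    """Average frame-wise F0 over the first half of a voiced segment."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    track = [autocorr_f0(signal[s:s + frame_len], sample_rate) for s in starts]
    half = track[: max(1, len(track) // 2)]
    return sum(half) / len(half)


# Synthetic "vowel": 200 ms of a 200 Hz tone with two harmonics, 8 kHz sampling.
sr = 8000
vowel = [
    math.sin(2 * math.pi * 200 * t / sr)
    + 0.5 * math.sin(2 * math.pi * 400 * t / sr)
    + 0.25 * math.sin(2 * math.pi * 600 * t / sr)
    for t in range(int(0.2 * sr))
]
estimate = first_half_mean_f0(vowel, sr, frame_len=320, hop=80)
print(round(estimate, 1))
```

Restricting the average to the first half of the track mirrors the motivation given above: frames near the right vowel boundary, where segmentation is least reliable, do not enter the estimate.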

Preliminary data analysis was conducted to identify potential
covariates, using both demographic and psychoacoustic variables.
Chi-square tests indicated that there were no significant differences
between the full speech and high-pass filtered condition
with respect to participant gender and handedness; there was also
no significant difference between stimulus voice (two male, two
female) and participant gender. Non-parametric Mann-Whitney
tests for variables without normal distribution indicated no significant
difference between the experimental conditions with
respect to musicality [determined on the basis of a self-reported
evaluation on an 11-point scale, anchored at 0 (no experience)
and 10 (professional musician)], age, δp (sound perception
preference), δA (listener attention) and δp1000 (sound perception
preference for stimuli above 1000 Hz). A zero-order correlation
analysis assessed the relationship between demographic and psychoacoustic
variables. The purpose of the matrix was to determine
which variables might affect degrees of imitation and could
thus be included in the regression analysis. As seen in Table 3,
there was a significant correlation between musicality and δp
(r = 0.51, p < 0.001), δA (r = 0.49, p < 0.001) and δp1000 (r =
0.46, p < 0.001); participants with more musical experience performed
with a more fundamental perceptual bias with respect
to stimuli with a missing fundamental and scored higher on
categorizing non-ambiguous acoustic stimuli as well. There was
also a significant correlation between δp and δA (r = 0.47, p <
0.001) and δp1000 and δA (r = 0.39, p < 0.001); more fundamental
perceptual bias was related to a better performance on the
non-ambiguous stimuli. The two ways of assessing auditory perception
bias, δp and δp1000, were significantly correlated (r = 0.94,
p < 0.001). A trend for significance was observed in the relation
between the first F0 imitation and the experimental condition
and between gender and the second F0 imitation (significant
with α < 0.10). In addition to the correlation tests, we also
explored the effect of the categorical variables (Condition, Gender
and Handedness) on the measures of the Listener Attention
Coefficient, the Coefficient of Sound Perception Preference, the
Coefficient of Sound Perception Preference above 1000 Hz, F0
Imitation1 (first shadowing block) and F0 Imitation2 (second
shadowing block). Gender and handedness had no effect on any
of the measures. There was a marginally significant effect of condition
on F0 Imitation1 (t(86) = −1.81, p = 0.07) with a lower
degree of imitation in the filtered condition compared to the
full speech condition. There were no other significant effects of
condition. Based on the results of the correlation analyses, which
suggested a stronger link between δp1000 and imitation, only δp1000
was included as a covariate in the primary statistical modeling of
the first F0 imitation (first shadowing, i.e., second block in the
session) in the two experimental conditions.



Table 3 | Zero-order Pearson product-moment correlations among psychoacoustic variables and the socio-demographic variables.

Variable             1       2       3       4       5        6        7        8       9        10
1. Condition
2. Age               -0.02
3. Gender            0.08    -0.14
4. Handedness        -0.06   0.01    0.12
5. Musicality        -0.07   0.18    0.07    -0.18
6. δA                0.06    0.07    -0.07   -0.04   0.49**
7. δp                -0.09   0.11    0.04    0.09    0.51**   0.47**
8. δp1000            -0.10   0.06    0.01    0.04    0.46**   0.39**   0.94**
9. F0 Imitation1     0.19*   -0.03   0.11    0.12    -0.03    0.05     0.17     0.24*
10. F0 Imitation2    0.15    -0.05   0.18*   0.10    0.00     0.06     0.17     0.18    0.62**

"Condition" was dummy-coded to compare the effect of frequency filtering with other responses (1 = filtered, 0 = full speech). "Gender" was dummy-coded to
compare the performance of male and female listeners (1 = female, 0 = male). "Handedness" was dummy-coded to compare the performance of left- and right-handers
(1 = right, 0 = left). δA, Listener Attention Coefficient; δp, Coefficient of Sound Perception Preference; δp1000, Coefficient of Sound Perception Preference
above 1000 Hz.

*p < 0.10, *p < 0.05, **p < 0.001.

Hierarchical multiple regression was used to establish the
incremental value of auditory perception bias when predicting the
level of F0 imitation in a condition with high-pass filtered
speech and in a condition with full speech signal. The regression
model consisted of two blocks and assessed the additional
variance explained with the estimation of each added block. At
Block 1, the centered values of δp1000 and experimental condition
were entered simultaneously. This block resulted in a significant
overall model, F(2, 85) = 4.87, p = 0.01, accounting for 10%
of the variance in the imitation scores. The interaction effect
between δp1000 and experimental condition was created by multiplying
the mean-centered values of each individual variable and
then was entered at Block 2 along with all variables entered at
Block 1. Results again indicated an overall effect for the model,
F(3, 84) = 3.27, p = 0.03, explaining an additional variance of
0.2%. The δp1000 by experimental condition interaction term did
not significantly predict the imitation scores after controlling for
covariates and main effects (b = −8.02, p = 0.69). Figures 3 and 4
graphically display the main effects of δp1000 and condition on
F0 imitation. The y-axes express the difference between D1, the
absolute difference between the model speaker's F0 and the participant's
F0 in the first (baseline) block, and D2, the absolute difference
between the model speaker's F0 and the participant's F0 in
the second (first shadowing) block; a positive value here indicates
imitation and a negative value indicates divergence. The figures
show that more fundamental listeners were better at imitating the
fundamental frequency in the model speaker's voice. Fully tabulated
results of the hierarchical regression model are presented in
Table 4.
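
The construction of the Block 2 interaction term can be sketched as follows: each predictor is mean-centered and the centered values are multiplied observation by observation. The data values and names below are hypothetical and serve only to show the computation, not the fitting of the regression model itself.

```python
def mean_center(values):
    """Subtract the sample mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def interaction_term(x, z):
    """Product of two mean-centered predictors, one value per participant."""
    return [a * b for a, b in zip(mean_center(x), mean_center(z))]


# Hypothetical data for four participants (not taken from the experiment).
delta_p1000 = [0.8, -0.2, 0.4, 0.0]   # perception bias above 1000 Hz
condition = [1, 0, 1, 0]              # 1 = filtered, 0 = full speech

centered_bias = mean_center(delta_p1000)
centered_condition = mean_center(condition)
product = interaction_term(delta_p1000, condition)
print(centered_bias, centered_condition, product)
```

Centering before multiplying is a common way to keep the product term from being strongly correlated with the main-effect predictors it is built from.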

FIGURE 3 | The relation between auditory perception bias above
1000 Hz and the degree of F0 imitation in the condition with full
speech signal (dotted line indicating trend). The x-axis represents
the auditory perception bias expressed as δp1000. The y-axis
expresses the difference between D1, the absolute difference
between the model speaker's F0 and the participant's F0 in the first
(baseline) block, and D2, the absolute difference between the model
speaker's F0 and the participant's F0 in the second (first shadowing)
block; a positive value here indicates imitation and a negative value
indicates divergence.



DISCUSSION 

Our findings suggest that auditory perception bias can partly 
account for the individual variation found in earlier pitch 
imitation studies. In a shadowing task, fundamental listeners 
showed a better capacity to imitate the vocal pitch of the model 
talkers, especially in a condition where the region between 0 and
300 Hz had been filtered out and information about F0 had
to be derived from the higher frequencies (akin to telephone 
speech). These results can be used in future studies on speech 
imitation abilities, e.g., to explore phenomena such as phonetic 
(pronunciation) talent (Lewandowski, 2009). 

FIGURE 4 | The relation between auditory perception bias above
1000 Hz and the degree of F0 imitation in the condition with filtered
signal (dotted line indicating trend). The x- and y-axis represent the
same measures as in Figure 3.

Table 4 | Results of the hierarchical regression model.

Variable                       b        SE       β         Adjusted R²   ΔR²
Step 1                                                     0.08          0.10**
  δp1000                       26.07    10.02    0.26**
  Filter condition             17.46    8.33     0.22*
Step 2                                                     0.07          0.00*
  δp1000                       24.79    10.10    0.26*
  Filter condition             17.44    8.37     0.22*
  δp1000 by filter condition   -8.02    20.19    -0.04

δp1000, coefficient of sound perception preference above 1000 Hz.
*p < 0.05, **p < 0.01.

Our findings of individual differences in listeners' sensitivity
to tone sequences may be related to those of Semal and
Demany (2006), who found some listeners to be able to detect
changes in tone sequences, while unable to indicate the direction
of change (upward or downward). Future studies should
address the relation between individual differences in sensitivity
to pitch direction and in auditory perception bias. At this
point it is unclear what is causing the individual differences in
auditory perception bias. As stated in the Introduction, Schneider
et al. (2005b) found neuroanatomical differences in the lateral 
Heschl's gyrus to be associated with perception bias. However, 
the differences may very well be of a more peripheral origin, 
i.e., reflecting individual differences in cochlear responses. In 
particular, non-linear interactions in the cochlea may give rise 
to so-called combination tones (Plomp, 1965). When stimulated
with a tone consisting of the nth and (n + 1)th harmonic,
the cochlea may generate tones at a frequency corresponding to 
that of the missing fundamental. It is important to stress that 
the generated tone is physically present because it is generated 
in the cochlea, rather than being extracted from the harmon- 
ics (as is the case for the missing fundamental). Plomp (1965) 
claimed that combination tones are inaudible for "usual levels" 
of speech and music and that the same applies to the perception 
of the missing fundamental. Notwithstanding this claim, in his 
study of individual differences in (what we call) auditory per- 
ception bias, Smoorenburg (1970) effectively suppressed the per- 
ception of combination tones by superimposing masking noise 
bands centered at the combination-tone frequencies. Apparently, 
Smoorenburg (1970) was concerned about a potential interfer- 
ing effect of combination tones in the determination of listener 
type. Given that in the experiment reported here, the stimuli 
were presented without masking noise, the participants may have 
perceived physically generated tones at the level of the missing
fundamental. The generation of combination tones could
have led to overestimates of δp, because spectral listeners may
perceive the combination tone instead of a reconstructed funda- 
mental (as fundamental listeners do), thus explaining the skewed 
distribution in both first and second measurement of the percep- 
tion bias. On the one hand, the presence of combination tones 
may invalidate the determination of listener type. On the other 
hand, combination tones are an inevitable byproduct of natu- 
rally occurring sounds. Cochlear dynamics generate combination 
tones which affect further cortical processing and anatomical 
correlates (i.e., lateral Heschl's gyrus). As such, the auditory per- 
ception bias as measured in our experiment takes into account 
individual variations in sensitivity to combination tones. In gen- 
eral, the potential role of combination tones in the definition 
and study of listener types deserves further attention. Ladd et al. 
(2013) pointed at the methodological differences in earlier studies 
of listener type performed by Schneider et al. (2005b) and Seither- 
Preisler et al. (2007), but did not identify the use of masking 
noise (or other means to suppress combination tones) as a main 
methodological difference between their study and both earlier 
ones. In our future work, we aim at a detailed investigation of the 
role of combination tones in auditory perception bias. 